(54) Title: SOFTWARE WATERMARKING TECHNIQUES
(57) Abstract

A method of watermarking a software object
whereby a watermark is stored in the state of the
software object as it is being run with a particular
input sequence. Further disclosed is a method of
watermarking software including the steps of:
embedding a watermark in a static string and applying
an obfuscation technique whereby this static string
is converted into executable code. Also disclosed
is a method of verifying the integrity or origin of a
program comprising embedding a watermark in the
state of a program as the program is being run with a
particular input sequence; building a recognizer
concurrently with the input and watermark wherein the
recognizer is adapted to extract the watermark graph
from other dynamic structures on the heap or stack
wherein the recognizer is kept separately from the
program; wherein it is adapted to check for a number
n, n, in a preferred embodiment, being the product of
two primes and wherein n is embedded in the
topology of the watermark. Further disclosed is a method
of watermarking software wherein the watermark is
chosen from a class of graphs wherein each member
has one or more properties, such as planarity, said
property being capable of being tested by integrity
testing software.
SOFTWARE WATERMARKING TECHNIQUES

FIELD OF THE INVENTION

The present invention relates to methods for protecting software against theft,
establishing/proving ownership of software and validating software. More particularly,
although not exclusively, the present invention provides for methods for
watermarking what will be generically referred to as software objects. In this context,
software objects may be understood to include programs and certain types of media.



BACKGROUND TO THE INVENTION

Watermarking is the process of embedding a secret message, the watermark, into a
cover or overt message. For example, in media watermarking, the secret is
commonly a copyright notice and the cover is a digital image, video or audio
recording. Fingerprinting is a method whereby each individual software application
incorporates a potentially unique watermark which allows that particular instance of
the software to be identified. Fingerprinting may be viewed as a multiple use of
watermarking techniques.



The watermark is constructed to make it difficult to remove the watermark without
damaging the software object in which it is embedded. Such watermarks may only
be removed safely by someone (or some process) in possession of one or more
secrets that were employed while constructing the watermark.



Watermarking a software object (hereafter referred to as an object) discourages
intellectual property theft. A further application is that watermarking an object can be
used to establish and/or prove evidence of ownership of an object. Fingerprinting is
similar to watermarking except a different watermark is embedded in every cover
message, thus providing a unique fingerprint for every object. Watermarking is
therefore a subset of fingerprinting and the latter may be used to detect not only the
fact that a theft has occurred, but may also allow identification of the particular object
and thus establish an audit trail which can be used to reveal the infringer of
copyright.

In the context of prior art watermark techniques, the following scenario serves to
illustrate the ways in which a watermarked object may be vulnerable to attack. With
reference to Figure 1, suppose that A watermarks an object O with a watermark W
and key K. If the object O is sold to B and B wishes to (illegally) on-sell O to C, there
are various types of attack to which O may be vulnerable.

Detection: initially B must try to detect the presence of the watermark in O. If there
is no watermark, no further action is necessary.

Locate and remove: once B has determined that O carries a watermark, B may try to
locate and remove W without otherwise harming the rest of the contents of O.

Distort: if some degradation in quality of O is acceptable, B may distort it sufficiently
so that it becomes impossible for A to detect the presence of the watermark W in the
object O.

Add: alternatively, if removing the watermark W is too difficult, or distorting the object
O is not acceptable, B might simply add his own watermark W' (or several such
marks) to the object O. This way, A's mark becomes just one of many.

It is considered that most media watermarking schemes are vulnerable to attack by
distortion. For example, image transforms such as cropping and lossy compression
will distort the image sufficiently to render many watermarks unrecoverable.

To the knowledge of the applicants there exists no effective watermarking scheme
which is capable of use with or appropriate for software. It would be a significant
advantage to be able to apply watermarking techniques to software in view of the
widespread occurrence of software piracy. It is estimated that software piracy costs
approximately 15 billion dollars per year. Thus the problem of software security and
protection is of significant commercial importance.

One simple way, known in the prior art, of embedding a watermark in a piece of
software is simply to include it in the initialized static data section of the object code.
In a similar, yet more complex manner, watermarks are often encoded in what is
known as an "Easter egg". This is a piece of code, activated by a highly
unusual or seldom encountered input to the particular application, which displays a
watermark image, plays a watermark sound, or, in some way, alerts the user that the
watermark code has been activated.



Thus, it is an object of the present invention to provide methods for watermarking
software objects which overcome the limitations inherent in prior art watermarking
techniques and allow for non-media objects to be watermarked effectively. It is a
further object of the present invention to provide methods for watermarking software
objects which are resistant to the aforementioned techniques for attacking watermarked
objects or to at least provide the public with a useful choice.

DISCLOSURE OF THE INVENTION 

In one aspect, the invention provides for a method of watermarking a software
object whereby a watermark is stored in the state of the software object as it is being
run with a particular input sequence.

The software object may be a program or piece of program. 

The state of the software object may correspond to the current values held in the
stack, heap, global variables, registers, program counter and the like.

In a preferred embodiment, the watermark may be stored in an object's execution
state whereby a (possibly empty) input sequence I is constructed which, when fed to
an application of which the object is a part, will make the object O enter a state which
represents the watermark, the representation being validated or checked by
examining the dynamically allocated data structures of the object O.

In an alternative embodiment, the watermark could be embedded in the execution
trace of the object O whereby, as a special input I is fed to O, the address/operator
trace is monitored and, based on a property of the trace, a watermark is extracted.

In a preferred embodiment, the watermark is embedded in the state of the program
as it is being run with a particular input sequence I1 ... Ik.



The watermark may be embedded in the topology of a dynamically built graph
structure.

The graph structure (or watermark graph) corresponds to a representation of the
data structure of the program and may be viewed as a set of nodes together with a
set of edges.

The method may further comprise building a recognizer R concurrently with the
input I and watermark W.

Preferably R is a function adapted to identify and extract the watermark graph from
all other dynamically allocated data structures.

In an alternative, less preferred embodiment, the watermark W may incorporate a
marker that will allow R to recognize it easily.

In a preferred embodiment R is retained separately from the program whereby R is
dynamically linked with the program when it is checked for the existence of a
watermark.

Preferably the application of which the object forms a part is obfuscated or
incorporates tamper-proofing code.

In a preferred embodiment R extracts a value n from the topology of the graph
comprising the watermark W.

The watermark W has a signature property s where s(W) evaluates to "true" if the
watermark W is recognisable, wherein the recogniser R tests a presumed watermark
W by evaluating the signature property s(W).

In a preferred embodiment, the method includes the creation of a number n which
may be embedded in the topology of a watermark graph, wherein the signature
property s(W) is a function of a number n so embedded.

In a preferred embodiment the signature property s(W) is "true" if and only if the
number n is the product of two primes.
The invention further provides for a method of verifying the integrity or origin of a
program comprising:

embedding a watermark W in the state of a program as the program is being run
with a particular input sequence I;

building a recognizer R concurrently with the input I and watermark W wherein the
recognizer is adapted to extract the watermark graph from other dynamically
allocated data structures wherein R is kept separately from the program; wherein R
is adapted to check for a number n, n, in a preferred embodiment, being the product
of two primes and wherein n is embedded in the topology of W.

Other properties of W may be used to compute the signature.

The number n may be derived from any combination of numbers depending on the
context and application.

Preferably the program or code is further adapted to be resistant to tampering,
preferably by means of obfuscation or by adding tamper-proofing code.

Preferably the watermarks W are chosen from a class of graphs G wherein each
member of G has one or more properties, such as planarity, said property being
capable of being tested by integrity-testing software.

In an alternative embodiment, the watermark may be rendered tamperproof to
certain transformations, such as attacks, by expanding each node of the watermark
graph into a j-cycle, where j may be any number, in a preferred embodiment, a small
number from 1 to 5.

In a broad aspect, the recognizer R checks for the effect of the watermarking code
on the execution state of the application, thereby preserving the ability to recognize
the watermark in cases where semantics-preserving transformations have been
applied to the application.



In a further aspect, the invention provides for a method of watermarking software
including the steps of:

embedding a watermark in a static string, then applying an obfuscation
technique whereby this static string is converted into executable code.

The executable code is called whenever the static string is required by the program.

BRIEF DESCRIPTION OF THE DRAWINGS 

The present invention will now be described by way of example only and with
reference to the figures in which:

Figure 1: illustrates methods of adding a watermark to an object and attacking
the integrity of such a watermark;

Figure 2: illustrates methods of embedding a watermark in a program;

Figure 3: illustrates an example of a function used to embed a watermark within
a static string;

Figure 4: illustrates insertion of a bogus predicate into a program;

Figure 5: illustrates splitting variables;

Figure 6: illustrates merging variables;

Figure 7: illustrates the conversion of a code section into a different virtual
machine code;

Figure 8: illustrates an example of a method of the watermarking scheme
according to the present invention;

Figure 9: illustrates a possible encoding method for embedding a number in the
topology of a graph;

Figure 10: illustrates another possible embodiment for embedding a number in
the topology of a graph;

Figure 11: illustrates a marker in a graph;

Figure 12: illustrates examples of obfuscating transformations;

Figure 13: illustrates examples of tamperproofing Java code;

Figure 14: illustrates enumeration encoding in a planted plane cubic tree on 2m =
8 nodes; and

Figure 15: illustrates tamperproofing against node-splitting.



Referring to Figure 1(b), a way is shown by which Bob can circumvent a
watermarking scheme by distorting the protected object. If the distortion is at "just
the right level", O will still be usable by Bob, but Charles will be unable to extract the
watermark. In Figure 1(9), the distortion is so severe that O is no longer functional,
so Bob will not be able to use it, nor is he able to on-sell it.

In the present context, tamperproofing is applied in order to prevent an adversary
from removing the watermark and to provide assurance to the software end-user that
the software object hasn't been tampered with. Thus the 'integrity' of the program
may be verified. The primary aim of the present invention is to allow accurate
assertion of ownership of a software object, with a secondary purpose being to
ensure the integrity of the object.
It has been shown that there are transformations, called obfuscating transformations,
that will destroy almost any kind of program structure while preserving the semantics
(operational behaviour) of the program. Other semantics-preserving transformations,
such as optimising transformations known from the prior art, can be used to similar
effect. As a consequence, any software watermarking technique must be evaluated
with respect to its resilience to attack from automatic application of
semantics-preserving transformations, such as obfuscation. The following discussion will survey
obfuscating transformations that can be used to destroy software watermarks.

In Figure 2a a watermark is embedded within a static string. There are several ways
of rendering watermarks unrecognisable, the most effective perhaps by converting
static strings into a program that produces the data. As an example, consider the
function G in Figure 3. This function was constructed to obfuscate the strings "AAA",
"BAAAA", and "CCB". The values produced by G are G(1)="AAA", G(2)="BAAAA",
G(3)=G(5)="CCB", and G(4)="XCB".
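
The idea of replacing static strings with code that computes them can be sketched as follows. This is a hypothetical stand-in written for illustration, not the function shown in Figure 3; it only reproduces the input/output values listed above, and its internal structure is an assumption.

```python
def G(n: int) -> str:
    """Hypothetical string-producing function: no string literal "AAA",
    "BAAAA" or "CCB" appears in the code, yet G yields them on demand."""
    chars = []
    if n <= 2:
        if n == 2:
            chars.append(chr(ord("A") + 1))       # 'B'
        chars.extend("A" for _ in range(n + 2))   # 3 or 4 copies of 'A'
    else:
        chars.append("X" if n == 4 else "C")      # only G(4) starts with 'X'
        chars.append(chr(ord("A") + 2))           # 'C'
        chars.append(chr(ord("A") + 1))           # 'B'
    return "".join(chars)
```

A program that previously referenced the literal "CCB" would call G(3) or G(5) instead, so a search of the static data section no longer finds the string.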

In Figure 2b Alice embeds a watermark within the program code itself. There are
numerous ways to attack such code. Figure 4, for example, shows how it is possible
to insert bogus predicates into a program. These predicates are called opaque since
their outcome is known at obfuscation time, but difficult to deduce otherwise. Highly
resilient opaque predicates can be constructed using hard static analysis problems
such as aliasing.
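
A minimal sketch of a bogus (opaque) predicate follows. It uses a simple number-theoretic identity rather than the aliasing-based constructions mentioned above, so it is far weaker than what the text has in mind; the function names are hypothetical.

```python
def opaquely_true(x: int) -> bool:
    """Always True: x*(x+1) is a product of two consecutive integers,
    so it is even for every integer x."""
    return (x * (x + 1)) % 2 == 0


def original(a: int, b: int) -> int:
    return a + b


def obfuscated(a: int, b: int, x: int) -> int:
    """Same behaviour as original(), with a bogus branch inserted."""
    if opaquely_true(x):
        return a + b
    return a - b   # dead code: never executed, but an analyser must prove that
```

The obfuscator knows the outcome of the predicate when it inserts it, whereas a tool analysing the code has to establish that the second return is unreachable.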

In Figure 2c a watermark is embedded within the state (global, heap, and stack data,
etc.) of the program as it is being run with a particular input I. Different obfuscation
techniques can be employed to destroy this state, depending on the type of the data.
For example, one variable can be split into several variables (Figure 5) or several
variables can be merged into one (Figure 6).
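
Variable splitting and merging can be illustrated with the sketch below. The particular encodings (base-16 split, 16-bit packing) are assumptions chosen for brevity and are not taken from Figures 5 and 6.

```python
from typing import Tuple


def split(v: int) -> Tuple[int, int]:
    """Represent one integer v as two variables (hi, lo) with v == 16*hi + lo."""
    return divmod(v, 16)


def join(hi: int, lo: int) -> int:
    """Recover the original value from its two halves."""
    return 16 * hi + lo


def add_to_split(hi: int, lo: int, k: int) -> Tuple[int, int]:
    """Add k to a split variable without ever rebuilding the original value."""
    carry, lo = divmod(lo + k, 16)
    return hi + carry, lo


def merge(x: int, y: int) -> int:
    """Pack two values (each assumed to satisfy 0 <= value < 2**16) into one."""
    return (x << 16) | y


def unmerge(z: int) -> Tuple[int, int]:
    """Unpack the two values stored by merge()."""
    return z >> 16, z & 0xFFFF
```

After such a transformation the program state no longer contains the original variable in recognisable form, which is why a watermark stored in plain state values is vulnerable.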

In Figure 2d a watermark is embedded within the trace (either instructions or
addresses, or both) of the program as it is being run with a special input sequence I
= I1, I2, ... Ik. In an alternative embodiment a watermark may be embedded within a
series of execution traces, said series of traces being generated as the program is
run on a special input. This special input is comprised of a series of one or more
input sequences, where each input sequence is generated by a specific process
which may incorporate a random or pseudorandom number generator. Execution
traces have many properties that may be observed by a watermark recogniser R.
One example of such a property is "if the program passes point P1 in O, then there's
a 32% chance that it will also pass point P2". Another example of such a property is
the frequency at which some specific basic operation, such as addition, is performed.
A specific collection of (one or more) such execution-trace properties is the
watermark W. The signature property s(W) for this W is that all the property values
are within some predefined tolerance. For example, we might require that our
sample property P1-P2 have a value between 30% and 34% on a randomly-generated
series of 10000 inputs (note that we would not expect to observe an
"exact match" to our 32% estimated mean-value for this property P1-P2, because
each randomly-generated series of inputs would give us a somewhat different
measurement for this property value).
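
The tolerance test on the P1-P2 property can be sketched as below. The representation of a trace as a list of point labels, and the function names, are assumptions made for this illustration only.

```python
from typing import List, Optional, Sequence


def conditional_frequency(traces: Sequence[List[str]],
                          first: str = "P1",
                          then: str = "P2") -> Optional[float]:
    """Fraction of traces passing `first` that later also pass `then`."""
    passed_first = 0
    passed_both = 0
    for trace in traces:
        if first in trace:
            passed_first += 1
            position = trace.index(first)
            if then in trace[position + 1:]:
                passed_both += 1
    if passed_first == 0:
        return None                      # property cannot be measured
    return passed_both / passed_first


def signature(traces: Sequence[List[str]],
              low: float = 0.30, high: float = 0.34) -> bool:
    """s(W): True when the measured property lies within the tolerance band."""
    value = conditional_frequency(traces)
    return value is not None and low <= value <= high
```

In practice the traces would come from running the watermarked program on the randomly generated series of inputs; here they are simply passed in as data.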

Many of the same transformations that can be used to obfuscate code will also
obfuscate an instruction trace. Figure 7 shows another, more potent, transformation.
The idea is to convert a section of code (Java bytecode in our case) into a different
virtual machine code. The new code is then executed by a virtual machine interpreter
included with the obfuscated application. The execution trace of the new virtual
machine running the obfuscated program will be completely different from that of the
original program. In Figure 2e, a watermark is embedded in an Easter Egg. Unless
the code is obfuscated, Easter Eggs may be found by straightforward techniques
such as decompilation and disassembly.
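
As a toy illustration of re-hosting code on a different virtual machine, the sketch below interprets a small stack program that computes (a + b) * c. The instruction set and program are invented for this example and are unrelated to Java bytecode or to Figure 7.

```python
def run_vm(code, env):
    """Interpret a list of (opcode, argument) pairs on a tiny stack machine.
    Returns the result and the sequence of opcodes that were executed."""
    stack = []
    trace = []
    for op, arg in code:
        trace.append(op)
        if op == "push":
            stack.append(env[arg])       # load a named variable
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError("unknown opcode: %r" % (op,))
    return stack.pop(), trace


# Hypothetical translation of the expression (a + b) * c.
PROGRAM = [("push", "a"), ("push", "b"), ("add", None),
           ("push", "c"), ("mul", None)]
```

The host machine now executes the interpreter loop rather than the original add and multiply instructions directly, so a watermark encoded in the original instruction trace would not survive.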

In this section, techniques for embedding software watermarks in dynamic data
structures are discussed. The inventors view these techniques as the most
promising for withstanding de-watermarking attacks by obfuscation.

The basic structure of the proposed watermarking technique is outlined in Figure 8.
The method is as follows:

1. The watermark W is embedded, not in the static structure of the program, its
code (Unix text segment), its static data (Unix initialised data segment), or its
type information (Unix symbol segment or Java's Constant Pool), but rather in
the state of the program as it is being run with a particular input sequence I
(of length k) whose elements are I = I1 ... Ik. Of course k may be 0, in which
case there is no input and the input sequence is empty.



2. More specifically, the watermark is embedded in the topology of a
dynamically built graph structure. It is believed that obfuscating the topology
of a graph is fundamentally more difficult than obfuscating other types of
data. Moreover, it is anticipated that tamperproofing such a structure should
be easier than tamperproofing code or static data. This is particularly true of
languages like Java, where a program has no direct access to its own code.

3. A Recogniser R is built along with the input I and watermark W. R is a
function that is able to identify and extract the watermark graph from among
all other dynamically allocated data structures. Since, in general, sub-graph
isomorphism is a difficult problem, it is possible that W will have some special
marker that will allow R to recognise W easily. Alternatively, W may be
formed immediately after input Ik is processed, i.e. markers may not be
necessary. Markers are considered "unstealthy" for the following reason. If a
marker is easily recognisable by a recogniser, an adversary might discover it
- perhaps by way of a collusive attack on a collection of fingerprinted objects.
The use of markers can be avoided by exploiting the recogniser's knowledge
of the secret input sequence in the following way: the watermark will be
completed immediately after the k-th input (Ik) of this sequence is presented to
the program. The recogniser knows the value of "k" and therefore is able to
look for the watermark graph effectively, by examining the nodes that were
allocated or modified during the processing of Ik. In contrast, the adversary
would be unaware of the length of this sequence and would therefore have to
"guess" a value of "k" as well as the values (I1, I2 ... Ik) in the input sequence I,
before looking for the watermark.



4. An important aspect of the proposed technique is that R is not distributed
along with the rest of the program. If it were, a potential adversary could
identify and decompile it, and discover the relevant property of W. R is
employed only when we check for the watermark. R may be an extension of
the program comprised of self-monitoring code, or it may be an adjunct to a
debugger or some other external means for examining the dynamic state of
the program. R may be linked in dynamically with the program when we
check for the watermark. Other mechanisms are envisaged by which the
recogniser R may observe the state of the object O.

5. It is required that some signature property s(W) of W be highly resilient to
tampering. This can be achieved, for example, by obfuscation or by adding
tamperproofing code to the application.

6. In Figure 8 it is assumed that the signature that R checks for is a number n,
which has been embedded in the topology of W. n is the product of two large
primes P and Q. To prove the legal origin of the program, we link in R, run
the resulting program with I as input, and show that we can factor the number
that R produces. Alternatively, s(W) can be based on hard computational
problems other than factorisation of large integers.
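
The ownership argument in step 6 amounts to exhibiting the two prime factors of the number that R extracts. A minimal sketch of that check follows; the function names are hypothetical, and trial division is used only because the example numbers are tiny (real primes P and Q would be large and would need a proper primality test).

```python
def is_prime(n: int) -> bool:
    """Trial-division primality test; adequate only for small toy values."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def proves_ownership(n: int, p: int, q: int) -> bool:
    """True when the claimant's (p, q) is a factorisation of the extracted
    number n into two primes."""
    return p * q == n and is_prime(p) and is_prime(q)
```

For example, if the recogniser extracts n = 4453 from the watermark graph, only a party who knows 61 and 73 can pass the check without first factoring n.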

The above issues will now be discussed in more detail. The first problem to be
solved is how to embed a number in the topology of a graph. There are a number of
ways of doing this, and, in fact, a watermarking tool should have a library of many
such techniques to choose from. Figure 9 illustrates one possible encoding. The
structure is basically a linked list with an extra pointer field which encodes a base-6
digit. A null-pointer encodes a 0, a self-pointer a 0, a pointer to the next node
encodes a 1, etc. A further example is shown in Figure 14 whereby the watermark W
is chosen from a class of graphs G wherein each member of G has one or more
properties (in Figure 14, planarity) that may be tested by integrity-checking software.
The integrity-checking software may be incorporated into the program during the
watermarking process.
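
A radix encoding of this kind can be sketched as follows. The digit assignment used here (null pointer for 0, a pointer d-1 nodes ahead for digit d, so that a self-pointer is 1) is an assumption made so that every digit has a distinct encoding; it is not claimed to match Figure 9 exactly. The list is circular and padded to at least five nodes so that all five non-null targets are distinct.

```python
class Node:
    def __init__(self):
        self.next = None    # spine of the circular linked list
        self.digit = None   # extra pointer whose target encodes one digit


def encode(n: int, base: int = 6) -> Node:
    """Build a circular list whose pointer topology encodes n in `base`."""
    digits = []
    while n > 0:
        n, d = divmod(n, base)
        digits.append(d)
    while len(digits) < base - 1:        # pad with leading zeros
        digits.append(0)
    digits.reverse()                     # most significant digit first
    nodes = [Node() for _ in digits]
    for i, node in enumerate(nodes):
        node.next = nodes[(i + 1) % len(nodes)]
        if digits[i] > 0:
            node.digit = nodes[(i + digits[i] - 1) % len(nodes)]
    return nodes[0]


def decode(head: Node, base: int = 6) -> int:
    """Recover n by measuring how far ahead each digit pointer points."""
    n = 0
    node = head
    while True:
        d = 0
        if node.digit is not None:
            d, probe = 1, node
            while probe is not node.digit:
                probe = probe.next
                d += 1
        n = n * base + d
        node = node.next
        if node is head:
            return n
```

No integer is stored in any node: the value exists only in which node each extra pointer refers to, which is the sense in which the number is embedded in the topology.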

In the previous paragraph, it was shown how an integer n could be encoded in the
topology of a graph. The encoding is resilient to tampering, as long as the
recogniser R is able to correctly identify the nodes containing the two pointer fields in
which we have encoded n. We now describe another encoding showing that a
recogniser R can evaluate n if it can identify only a single pointer field per node.



Using a single pointer per node, we can construct a watermark W in the form of a
parent-pointer tree. The parent-pointer tree W is a representation of a graph G
known as an oriented tree, enumerable by the techniques described in Knuth, Vol. 1,
3rd Edition, Section 2.3.4.4.

The number of oriented trees with m nodes is asymptotically
$a_m = c\,(1/\alpha)^{m-1}/m^{3/2} + O\!\left((1/\alpha)^{m}/m^{5/2}\right)$
for $c \approx 0.44$ and $1/\alpha \approx 2.956$. Thus we can encode an arbitrary
1000-bit integer n in a graphic watermark W with
$1000/\log_2 2.956 \approx 640$ nodes.
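
The node estimate can be checked numerically by counting oriented (rooted, unlabelled) trees exactly. The sketch below uses the standard recurrence for that sequence; the function names and the search limit are assumptions made for this illustration.

```python
def rooted_tree_counts(limit: int) -> list:
    """Return a list a with a[m] = number of oriented trees on m nodes,
    for 1 <= m <= limit (a[0] is an unused placeholder)."""
    a = [0, 1]
    s = [0]   # s[k] = sum of d * a[d] over all divisors d of k
    for n in range(1, limit):
        s.append(sum(d * a[d] for d in range(1, n + 1) if n % d == 0))
        total = sum(s[k] * a[n - k + 1] for k in range(1, n + 1))
        a.append(total // n)             # recurrence gives a[n + 1]
    return a


def nodes_needed(bits: int, limit: int = 700) -> int:
    """Smallest m for which there are at least 2**bits oriented trees."""
    target = 1 << bits
    a = rooted_tree_counts(limit)
    for m in range(1, limit + 1):
        if a[m] >= target:
            return m
    raise ValueError("limit too small for the requested number of bits")
```

Calling nodes_needed(1000) gives the exact node count; it can be expected to lie a little above the leading-order estimate of about 640, since that estimate ignores the constant c and the $m^{3/2}$ divisor.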

We construct an index n for any enumerable graph in the usual way, that is, by
ordering the operations in the enumeration. For example, we might index the trees
with m nodes in "largest subtree first" order, in which case the path of length m-1
would be assigned index 1. Writing $a_m$ for the number of oriented trees with m
nodes, indices 2 through $a_{m-1}$ would be assigned to the other
trees in which there is a single subtree connected to the root node. Indices $a_{m-1}+1$
through $a_{m-1}+a_{m-2}$ would be assigned to the trees with exactly two subtrees
connected to the root node, such that one of the subtrees has exactly m-2 nodes.
The next $a_{m-3}a_2 = a_{m-3}$ indices would be assigned to trees with exactly two subtrees
connected to the root node, such that one of the subtrees has exactly m-3 nodes.
See Figure 10 for an example.

To aid the recognition of a watermark, the recogniser may use secret knowledge of a
"signal" indicating that "the next thing that follows" is the real watermark. In a
preferred embodiment, the secret is the input sequence I; the recogniser (but not the
attacker) knows that the watermark will be constructed after the input sequence I =
I1, ... Ik has been processed. In an alternative, but less preferred embodiment, the
secret is an easily recognisable "marker" that may be present in the watermark
graph. This is similar to the signals used between baseball coaches and their
players. See Figure 11 for an example.

One advantageous consequence of the present approach is that
semantics-preserving transformations, such as those employed in optimising compilers and
those employed by obfuscation techniques which target code and static data, will
have no effect on the dynamic structures that are being built. There are, however,
other techniques which can obfuscate dynamic data, and which we will need to
tamperproof against. There are three types of obfuscating transformations which will
need to be protected against:
1. An adversary can add extra pointers to the nodes of linked structures. This
will make it hard for R to recognise the real graph within a lot of extra bogus
pointer fields.

2. An adversary can rename and reorder the fields in the node, again making it
hard to recognise the real watermark.

3. Finally, an adversary can add levels of indirection, for example by splitting
nodes into several linked parts.

These transformations are illustrated in Figure 12. It is important to note here that
obfuscating linked structures has some potentially serious consequences. For
example, splitting nodes will increase the dynamic memory requirement of the
program (each cell carries a certain amount of overhead for type information etc.),
which could mean that a program which ran on, say, a machine with 32M of memory
would now not run at all. Furthermore, if we assume that an adversary does not
know in which dynamic structure our watermark is hidden, he is going to have to
obfuscate every dynamic memory allocation in the entire program.



Next will be discussed techniques for tamperproofing a dynamic watermark against
the obfuscation attacks outlined above.

The types of tamperproofing techniques that will be available will depend on the
nature of the distributed object code. If the code is strongly typed and supports
reflection (as is the case with Java bytecode) we can use these reflection capabilities
to construct the tamperproofing code. If, on the other hand, the application is shipped
as stripped, untyped, native code (as is the case with most programs written in C, for
example) this possibility is not open to us. Instead, we can insert code which
manipulates the dynamically allocated structures in such a way that obfuscating
them would be unsafe.

ANSI C's address manipulation facilities and limited reflection capabilities allow for
some trivial tamperproofing checks:

#include <stdlib.h>
#include <stddef.h>

struct s { int a; int b; };

/* Terminate the program when a check fails. */
static void die(void) { abort(); }

int main(void)
{
    /* Field a must still precede field b. */
    if (offsetof(struct s, a) > offsetof(struct s, b)) die();
    /* Size must be unchanged (assumes 4-byte int and no padding). */
    if (sizeof(struct s) != 8) die();
    return 0;
}

These tests will cause the program to terminate if the fields of the structure are
reordered, or the structure is split or augmented.

Figure 13 (a) shows how Java's reflection package allows us to perform
similar tamperproofing checks. Note that this example code is not completely
general, since Java does not specify the relative order of class fields.

Figure 13 (b) shows how we can also use opaque predicates and variables to
construct code which appears to (but in fact, does not) perform "unsafe" operations
on graph nodes. A de-watermarking tool will not be able to statically determine
whether it is safe to apply optimising or obfuscating transformations on the code. In
the example in Figure 13 (b), V is an opaque string variable whose value is "car",
although this is difficult for a de-watermarker to work out statically. At 1 it appears as
if some or all fields (which ones is unknown to the de-watermarker) are being set to
null, although this will never happen. The statement 2 is a redundant operation
performing n.car = n.car, although (due to the opaque variable R whose value is
always 1) this cannot in general be worked out statically.
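
The flavour of these checks can be conveyed by the following sketch, which uses Python's attribute introspection as a stand-in for Java reflection. It is not the code of Figure 13; the class, the helper names and the arithmetic used to build the opaque values are all assumptions for illustration.

```python
class Node:
    def __init__(self, car=None, cdr=None):
        self.car = car
        self.cdr = cdr


def fields_intact(node: Node) -> bool:
    """Reflection-style check: the node has exactly the expected fields."""
    return sorted(vars(node)) == ["car", "cdr"]


def opaque_field_name(seed: int) -> str:
    """Always "car": a square is congruent to 0 or 1 modulo 4, so only the
    first two table entries can ever be selected."""
    table = ["car", "car", "cdr", "cdr"]
    return table[(seed * seed) % 4]


def touch(node: Node, seed: int) -> None:
    """Appears to perform unsafe field updates, but changes nothing."""
    v = opaque_field_name(seed)
    if (seed * (seed + 1)) % 2 == 1:         # opaquely false
        for name in list(vars(node)):
            setattr(node, name, None)        # looks destructive, never runs
    setattr(node, v, getattr(node, v))       # redundant: node.car = node.car
```

Because the field names are computed at run time, a tool that renames or splits the fields of Node cannot easily prove that touch() will still behave correctly afterwards.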

For increased obscurity, the code to build the watermark should be scattered over
the entire application. The only restriction is that when the end of the input sequence
I = I1 ... Ik is reached, the watermark W has been constructed. This watermark in a
preferred embodiment may be composed of some or all of the components W1, ...
that were constructed previously. Additionally, in a preferred embodiment, some
components Wi may be composed of some or all components constructed before Wi.

if (input == I1) W1 = ...;
if (input == I2) W2 = ...;
...
if (input == Ik-1) Wk-1 = ...;
if (input == Ik) Wk = ...;
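
The scattered construction can be sketched as an application whose input handler quietly builds one component per secret input and assembles the watermark on the last one. The secret sequence, class names and the shape of the final graph (a small parent-pointer tree) are hypothetical choices for this example.

```python
class Node:
    def __init__(self, parent=None):
        self.parent = parent


class App:
    # Hypothetical secret input sequence I1 ... Ik (here k = 3).
    SECRET = ("open", "rotate", "save")

    def __init__(self):
        self.step = 0          # number of inputs processed so far
        self.on_track = True   # inputs so far match the secret sequence
        self.parts = []        # components W1, ... built so far
        self.watermark = None  # set once the k-th secret input is processed

    def handle(self, event: str) -> None:
        """Process one input; normal application work would also go here."""
        i = self.step
        self.step += 1
        if (not self.on_track or i >= len(self.SECRET)
                or event != self.SECRET[i]):
            self.on_track = False
            return
        if i < len(self.SECRET) - 1:
            self.parts.append(Node())        # build one component
        else:
            root = Node()
            for part in self.parts:          # assemble from earlier components
                part.parent = root
            self.watermark = root
```

On any other input sequence the program behaves normally and no watermark graph is ever completed on the heap.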

In order to identify the watermark structure, the recogniser must be able to
enumerate all dynamically allocated data structures. If this is not directly supported
by the runtime environment (as, for example, is the case with Java), we have two
choices. We can either rewrite the runtime system to give us the necessary
functionality or we can provide our own memory allocator. Notice, though, that this is
only necessary when we are attempting to recognise the watermark. Under normal
circumstances the application can run on the standard runtime system.
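
Providing one's own allocator for recognition runs could look roughly like the sketch below, in which every allocation is logged together with the index of the input being processed. The class and method names are hypothetical; a real system would hook the runtime's allocation path instead of requiring explicit calls.

```python
class TrackingAllocator:
    """Records each allocated object together with the current input step."""

    def __init__(self):
        self.step = 0
        self.log = []   # list of (step, object) pairs

    def next_input(self) -> None:
        """Call when the program starts processing the next input."""
        self.step += 1

    def new(self, factory, *args):
        """Allocate via factory(*args) and remember when it happened."""
        obj = factory(*args)
        self.log.append((self.step, obj))
        return obj

    def allocated_during(self, step: int) -> list:
        """All objects allocated while input number `step` was processed."""
        return [obj for s, obj in self.log if s == step]
```

A recogniser that knows k can then restrict its search to allocated_during(k), rather than scanning the whole heap. The log keeps objects alive, which is acceptable because it is used only during recognition.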

A further technique is shown in Figure 15. Here is illustrated a technique which
applies a local transformation, thereby tamperproofing the watermark against an
attack by node-splitting. Each of the nodes of the original watermark graph is
expanded into a 4-cycle. If an adversary splits two nodes, the underlying structure
ensures that these nodes will fall on a cycle. At (3) the recogniser shrinks the
biconnected components of the underlying graphs with the result that the graph is
isomorphic to the original watermark.
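
The expand, attack and shrink steps can be sketched on a small directed graph as below. Two simplifications are assumed: the original watermark graph is acyclic (for example a parent-pointer tree), and the recogniser contracts strongly connected components of the directed graph, which stands in for shrinking biconnected components of the underlying graph. All names are hypothetical.

```python
def expand(graph: dict, j: int = 4) -> dict:
    """Replace every node v by a directed j-cycle (v,0) -> ... -> (v,j-1)."""
    out = {}
    for v in graph:
        for i in range(j):
            out[(v, i)] = {(v, (i + 1) % j)}
    for u, succs in graph.items():
        for v in succs:
            out[(u, 0)].add((v, 0))          # original edge u -> v
    return out


def split_node(graph: dict, x, new) -> dict:
    """Attack: x keeps its in-edges, `new` takes its out-edges, x -> new."""
    g = {u: set(s) for u, s in graph.items()}
    g[new] = g[x]
    g[x] = {new}
    return g


def reachable(graph: dict, start) -> set:
    seen, stack = {start}, [start]
    while stack:
        for v in graph[stack.pop()]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def shrink(graph: dict):
    """Contract each group of mutually reachable nodes to a single node."""
    reach = {u: reachable(graph, u) for u in graph}
    comp = {u: frozenset(v for v in reach[u] if u in reach[v]) for u in graph}
    edges = {(comp[u], comp[v]) for u in graph for v in graph[u]
             if comp[u] != comp[v]}
    return set(comp.values()), edges
```

A split node and its new partner remain on the same cycle, so they are contracted together and the shrunken graph has the same shape as the original watermark.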

It is envisaged that local transformations, other than expansion of nodes into cycles,
may be employed to tamperproof the watermark against specific attacks other than
node-splitting. For example, redundant edges may be introduced into the watermark
in order to render the watermark tamperproof to specific attacks which involve the
renaming and reordering of fields in nodes.

30 A number of techniques are known in the prior art for hiding copyright notices in the 
object code of a program. It is the inventors' beWei that such methods are not 
resilient to attack by obfuscation - an adversary can apply a series of 
transformations that will hide or obscure the watermaric to the extent that it can no 
long r be reliably retrieved. 

The present invention indicates that the most reliable place to hide a watermark is
within the dynamically allocated data structures of the program, as it is being
executed with a particular input sequence.

A further application for the watermarking technique described above may be in
"fingerprinting" software. In this case, each individual program (i.e. every distributed
copy of the code) is watermarked with a different watermark. Although there is a risk
of an adversary collusively attacking the watermark, the applicant believes that
applying obfuscation may render it very difficult for the attacker to interpret the
evidence obtained by a collusive attack.
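
One possible reading of fingerprinting, combining the product-of-two-primes signature of claim 16 with the common prime factor p of claim 24, is sketched below in Python. The specific primes, the copy names and the functions is_prime, has_signature and identify are all hypothetical and chosen only for illustration; real fingerprints would use far larger numbers.

```python
from math import gcd

def is_prime(m):
    """Trial division; adequate for the small illustrative numbers used here."""
    if m < 2:
        return False
    i = 2
    while i * i <= m:
        if m % i == 0:
            return False
        i += 1
    return True

def has_signature(n):
    """True if n is the product of exactly two primes (the signature property)."""
    i = 2
    while i * i <= n:
        if n % i == 0:
            # i is the smallest divisor above 1, hence prime; check the cofactor.
            return is_prime(n // i)
        i += 1
    return False

P = 101                                    # hypothetical prime shared by all copies
copy_primes = {"copy-1": 103, "copy-2": 107, "copy-3": 109}   # hypothetical
fingerprints = {name: P * q for name, q in copy_primes.items()}

def identify(n):
    """Map a recovered number n back to the copy it was issued to, or None."""
    if n % P != 0 or not has_signature(n):
        return None
    q = n // P
    return next((name for name, prime in copy_primes.items() if prime == q), None)

# Two colluding holders who each recover n from their own copy can compute the
# shared factor, which is why the numbers alone should not be easy to extract.
shared = gcd(fingerprints["copy-1"], fingerprints["copy-2"])
```

Here identify(fingerprints["copy-2"]) returns "copy-2", while a number that lacks the factor P or fails the signature test is rejected.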

Where in the foregoing description reference has been made to elements or integers 
having known equivalents, then such equivalents are included as if they were 
individually set forth. 

Although the invention has been described by way of example and with reference to
particular embodiments, it is to be understood that modifications and/or
improvements may be made without departing from the scope or spirit of the
invention.

CLAIMS:

1. A method of watermarking a software object whereby a watermark is stored in
the state of the software object as it is being run with a particular input
sequence.

2. A method as claimed in claim 1 wherein the software object may be a program
or piece of program.

3. A method as claimed in any one of claims 1 or 2 wherein the state of the
software object may correspond to the current values held in the stack, heap,
global variables, registers, program counter and the like.

4. A method as claimed in any preceding claim wherein the watermark is stored in
an object's execution state whereby an input sequence I is constructed which,
when fed to an application of which the object is a part, will make the object O
enter a state which represents the watermark, the representation being
validated or checked by examining the stack, heap, global variables, registers,
program counter and the like, of the object O.

5. A method as claimed in any one of claims 1 or 2 wherein the watermark is
embedded in the execution trace of the object O whereby, as a special input I
is fed to O, the address/operator trace is monitored and, based on a property
of the trace, a watermark is extracted.

6. A method as claimed in any one of claims 1 to 4 wherein the watermark is
embedded in the topology of a dynamically built graph structure.

7. A method as claimed in claim 6 wherein the graph structure (or watermark
graph) corresponds to a representation of the data structure of the program
and may be viewed as a set of nodes together with a set of vertices.

8. A method as claimed in any preceding claim further comprising building a
recognizer R concurrently with the input I and watermark W.

9. A method as claimed in claim 8 wherein R is a function adapted to identify and
extract the watermark graph from all other dynamically allocated data
structures.

10. A method as claimed in either claim 8 or 9 wherein the watermark W
incorporates a marker that will allow R to recognize it easily.

11. A method as claimed in any one of claims 8 to 10 wherein R is retained
separately from the program and whereby R inspects the state of the program.

12. A method as claimed in any one of claims 8 to 11 wherein R is dynamically
linked with the program when it is checked for the existence of a watermark.

13. A method as claimed in any preceding claim wherein the application of which
the object forms a part is obfuscated or incorporates tamper-proofing code.

14. A method as claimed in any one of claims 8 to 12 wherein R checks W for a
signature property s(W).

15. A method as claimed in claim 14 including the creation of a number n which
may be embedded in the topology of W, whereby the signature property may
be evaluated by testing one or more numeric properties of n.

16. A method as claimed in claim 15 wherein the signature property is evaluated
by testing whether n is the product of two primes.

17. A method of verifying the integrity or origin of a program comprising:
embedding a watermark W in the state of a program as the program is being
run with a particular input sequence I;

building a recognizer R concurrently with the input I and watermark W wherein
the recognizer is adapted to extract the watermark graph from other dynamic
structures on the heap or stack wherein R is kept separately from the program;
wherein R is adapted to check for a number n, n, in a preferred embodiment,
being the product of two primes and wherein n is embedded in the topology of
W.

18. A method as claimed in claim 17 where other properties of s(W) are used to 
compute the signature. 

19. A method as claimed in either claim 17 or 18 wherein the number n is derived
from any combination of numbers depending on the context and application.

20. A method as claimed in any one of claims 17 to 19 wherein the program or
code is further adapted to be resistant to tampering, preferably by means of
obfuscation or by adding tamper-proofing code.

21. A method as claimed in any one of claims 17 to 20 wherein the recognizer R
checks for the effect of the watermarking code on the execution state of the
application thereby preserving the ability to recognize the watermark in cases
where semantics-preserving transformations have been applied to the
application.

22. A method of watermarking software including the steps of:
embedding a watermark in a static string; and

applying an obfuscation technique whereby this static string is converted into
executable code.

23. A method of fingerprinting software wherein a plurality of watermarked
programs obtained as claimed in any preceding claim are produced.

24. A method of fingerprinting software as claimed in claim 23 wherein the
watermarked programs each have a number n with a common prime
factor p.

25. A method of watermarking software wherein the watermark W is chosen from
a class of graphs G wherein each member of G has one or more properties,
such as planarity, said property being capable of being tested by
integrity-testing software.

26. A method of watermarking software as claimed in claim 25 wherein the
watermark may be rendered tamperproof to certain transformations by
subjecting the watermark graph to one or more local transformations.

27. A method of watermarking software as claimed in claim 26 wherein each
node of the watermark graph is expanded into a cycle.

28. A method substantially as herein described with reference to the drawings.

29. Software written to perform the method as claimed in any preceding claim.

30. A computer programmed to perform the method as claimed in any one of
claims 1 to 27.


